Power Tuning for HPC Jobs under Manufacturing Variations
Abstract
As we approach the exascale era, power has become a primary bottleneck. The US Department of Energy has set a power constraint of 20 MW on each exascale machine. To achieve one exaflop within 20 MW, power must be used intelligently to maximize performance under a power constraint. Most production-level parallel applications that run on a supercomputer are tightly coupled. A naive approach to enforcing a power constraint on a parallel job is to divide the job's power budget uniformly across all its processors. However, previous work has shown that a power-capped job suffers from performance variation across processors caused by manufacturing variations, which leads to sub-optimal overall performance. We propose a two-level hierarchical, variation-aware approach to managing power at the machine level. At the macro level, PPartition partitions the machine's power budget across jobs, assigning a power budget to each job running on the system such that the machine never exceeds its power budget. At the micro level, PTune makes job-centric decisions that take the performance variation into account. For every moldable job, it determines the optimal number of processors, the selection of processors, and the distribution of the job's power budget across them, with the goal of maximizing the job's performance under its power budget. Our evaluations show that, at the micro level, PTune achieves a performance improvement of up to 29% compared to the naive approach. PTune does not lead to any performance degradation, yet frees up almost 40% of the processors for the same performance as that of the naive approach, under a hard power bound. PPartition achieves a throughput improvement of 5-35% compared to uniform power distribution.
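The contrast between uniform and variation-aware power distribution can be illustrated with a small sketch. This is not the PTune algorithm, whose details the abstract does not give. It assumes a hypothetical linear power model per processor (a static term plus a per-GHz term), assumes that a tightly coupled job runs at the speed of its slowest processor, and ignores frequency limits. The names `Processor`, `uniform_job_freq`, and `variation_aware_caps` are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Processor:
    """Hypothetical linear power model: power = static_w + w_per_ghz * freq."""
    static_w: float   # assumed power draw at zero frequency, in watts
    w_per_ghz: float  # assumed extra watts per GHz; differs per chip

    def freq_at(self, power_w: float) -> float:
        """Frequency (GHz) this processor sustains under a given power cap."""
        return max(0.0, (power_w - self.static_w) / self.w_per_ghz)


def uniform_job_freq(procs: List[Processor], budget_w: float) -> float:
    """Naive split: every processor gets the same cap.

    The job is assumed to advance at the pace of its slowest processor.
    """
    share = budget_w / len(procs)
    return min(p.freq_at(share) for p in procs)


def variation_aware_caps(
    procs: List[Processor], budget_w: float
) -> Tuple[float, List[float]]:
    """Pick per-processor caps so all processors reach one common frequency.

    Solves sum(static_w + w_per_ghz * f) = budget_w for f.
    """
    static = sum(p.static_w for p in procs)
    if budget_w <= static:
        raise ValueError("budget does not cover the assumed static power")
    slope = sum(p.w_per_ghz for p in procs)
    freq = (budget_w - static) / slope
    caps = [p.static_w + p.w_per_ghz * freq for p in procs]
    return freq, caps


if __name__ == "__main__":
    # Three made-up processors with different power efficiency.
    procs = [Processor(20.0, 10.0), Processor(25.0, 12.0), Processor(30.0, 15.0)]
    budget = 150.0
    print("uniform job frequency:", uniform_job_freq(procs, budget))
    freq, caps = variation_aware_caps(procs, budget)
    print("variation-aware job frequency:", freq)
    print("per-processor caps:", caps)
```

Under these assumptions the uniform split leaves power stranded on the efficient processors while the least efficient one sets the pace. Equalizing frequency spends the same total budget and never yields a lower job frequency than the uniform split.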
Similar resources
Self-tuning job scheduling strategies for the resource management of HPC systems and computational grids
In this thesis we develop and study self-tuning job schedulers for resource management systems. Such schedulers search for the best solution among the available scheduling alternatives in order to improve the performance of static schedulers. In two domains of real world job scheduling this concept is implemented. First of all, we study the scheduling in resource management software for high pe...
Basic Issues in Identification Scheme of a Self-Tuning Power System Stabilizer
Power system stabilizers have been widely used and successfully implemented for the improvement of power system damping. However, a fixed parameter power system stabilizer tends to be sensitive to variations in generator dynamics so that, for operating conditions away from those used for design, the effectiveness of the stabilizer can be greatly impaired. With the advent of microprocessor techn...
Exploiting performance counters to predict and improve energy performance of HPC systems
Hardware monitoring through performance counters is available on almost all modern processors. Although these counters are originally designed for performance tuning, they have also been used for evaluating power consumption. We propose two approaches for modelling and understanding the behaviour of high performance computing (HPC) systems relying on hardware monitoring counters. We evaluate th...
Power, Reliability, Performance: One System to Rule Them All
Traditionally, the emphasis of High Performance Computing (HPC) data centers and applications has been on performance. However, it is anticipated that future generation supercomputing systems will face major challenges in reliability, power management, and thermal variations. Disruptive solutions are required to optimize performance in the presence of these challenges. We believe that a smart p...
Cooperative Batch Scheduling for HPC Systems
The batch scheduler is an important system software serving as the interface between users and HPC systems. Users submit their jobs via batch scheduling portal and the batch scheduler makes scheduling decision for each job based on its request for computing sources, i.e. core-hours. However, jobs submitted to HPC systems are usually parallel applications and their lifecycle consists of multiple...
Publication date: 2016